class: center, middle, inverse, title-slide

.title[
# ISA 444: Business Forecasting
]
.subtitle[
## 02: Introduction to R
] .author[ ###
Fadel M. Megahed, PhD
Endres Associate Professor
Farmer School of Business
Miami University
@FadelMegahed
fmegahed
fmegahed@miamioh.edu
Automated Scheduler for Office Hours
]
.date[
### Spring 2023
]

---

# Quick Refresher from Last Class

✅ Describe **course motivation** and **structure**.

✅ Explain the differences between **cross sectional**, **time series** and **panel** datasets.

✅ Describe the **components of time series** datasets.

✅ Explain the **forecasting steps**.

---

# Learning Objectives for Today's Class

- Describe the syntax, data types, and data structures in R.

- Access the help for R functions (each help file has the following components: Description, Usage, Arguments, Value, and Examples).

- Utilize the project workflow in RStudio and create an R script.

- Access, subset, and create `ts()` objects in R.

---

# Learning R (Any Programming Language)

<html>
<center>
<iframe src="https://giphy.com/embed/xonOzxf2M8hNu" width="480" height="270" frameBorder="0" class="giphy-embed" allowFullScreen></iframe><p></p>
</center>
</html>

* 🗣 **Get your hands dirty**‼️

* 📖 Documentation! Documentation! Documentation!

* 🔎 (Not surprisingly) Learn to Google what that error message means (I Google a lot 😂)

.footnote[
<html>
<hr>
</html>

**Source:** Slide is based on [Kia Ora's How I Learn a Technology](https://stats220.earo.me/01-intro.html#7).
]

---
class: inverse, center, middle

# The RStudio Interface, Setup and a Project-Oriented Workflow for your Analysis

---

## RStudio Interface

.center[<img src="../../figures/rstudio-interface.png" width="80%">]

.footnote[
<html>
<hr>
</html>

image credit: Stuart Lee]

???

live

---

## Setting up RStudio (do this once)

.pull-left[

Go to **Tools** > **Global Options**:

.center[<img src="../../figures/rstudio-setup.PNG" width="100%">]
]

.pull-right[
<br>
<br>
<br>
<br>
Uncheck `Workspace` and `History`, which helps to keep your R working environment fresh and clean every time you switch between projects.
]

---

## What is a Project?

* Treat each university course as a project to keep your work organised.

* A self-contained project is a folder that contains all relevant files, for example my `ISA 444/` 📂 includes:
  + `isa444.Rproj`
  + `lectures/`
      + `01_introduction/`
          * `01-intro.Rmd`, etc.
      + `02_introduction_to_r/`
          * `02_intro_r.Rmd`, etc.

* All working files are **relative** to the **project root** (i.e. `isa444/`).

* The project should just work on a different computer.

---

## Let's Create a `.Rproj` for Our Course
.pull-left-2[
1. Click the **Project** icon on the top right corner

<br>
<br>
<br>
<br>
2. **New Directory**/**Existing Directory** > **New Project** > **Create Project**

<br>
<br>
<br>
3. Open the project
]

.pull-right-2[
.center[<img src="../../figures/rstudio-proj1.png" width = "100%">]
]

---
class: inverse, center, middle

# R 101: Syntax, Data Types, Data Structures and Functions

---

# Coding Style

> .font150[Good coding style is like correct punctuation: you can manage without it, butitsuremakesthingseasiertoread. <br> -- [The tidyverse style guide](https://style.tidyverse.org)]

### R style guide

.pull-left[
✅ `snake_case`
]

.pull-right[
❌ `camelCase` (JavaScript)

❌ `PascalCase` (Python)
]

.footnote[
<html>
<hr>
</html>

**Source:** Slide is based on [Earo Wang's STAT 220 Slides](https://stats220.earo.me/01-intro.html#34)
]

---

# Data Types: A Visual Introduction

.center[<img src="https://d33wubrfki0l68.cloudfront.net/8a3d360c80da1186b1373a0ff0ddf7803b96e20d/254c6/diagrams/vectors/atomic.png" width="60%">]

- To check the **type of** an object in R, you can use the function `typeof()`.

.footnote[
<html>
<hr>
</html>

**Source:** The image is from [Hadley Wickham's Advanced R: Chapter 3 on Vectors](https://adv-r.hadley.nz/vectors-chap.html)
]

---
count: false

# Data Types: A Visual Introduction

<div class="figure" style="text-align: center">
<img src="data:image/png;base64,#../../figures/legos-jbryan-types.png" alt="A visual representation of different types of atomic vectors" width="100%" />
<p class="caption">A visual representation of different types of atomic vectors</p>
</div>

.footnote[
<html>
<hr>
</html>

**Source:** The images are from the excellent [lego-rstats GitHub Repository by Jenny Bryan](https://github.com/jennybc/lego-rstats#readme)
]

---

# Data Types: Formal Definitions

Each of the four primary types has a special syntax to create an individual value:

- Logicals can be written in full (`TRUE` or `FALSE`), or abbreviated (`T` or `F`).

- Doubles can be specified in decimal (`0.1234`), scientific (`1.23e4`), or hexadecimal (`0xcafe`) form.
  * There are three special values unique to doubles: `Inf`, `-Inf`, and `NaN` (not a number).
  * These are special values defined by the floating point standard.

- Integers are written similarly to doubles but must be followed by `L` (`1234L`, `1e4L`, or `0xcafeL`), and can not contain fractional values.

- Strings are surrounded by `"` (e.g., `"hi"`) or `'` (e.g., `'bye'`). Special characters are escaped with `\`; see `?Quotes` for full details.
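A minimal sketch of the syntax rules listed above, using `typeof()` to confirm each type. The object names (`lgl`, `dbl`, `int`, `chr`, `types`, `special`) are illustrative only and do not come from the source material.

```r
# Create one value of each primary atomic type
lgl <- TRUE      # logical, written in full
dbl <- 1.23e4    # double, written in scientific notation
int <- 1234L     # integer, note the trailing L
chr <- "hi"      # character string

# Ask R which type each value has
types <- c(
  lgl = typeof(lgl),
  dbl = typeof(dbl),
  int = typeof(int),
  chr = typeof(chr)
)
print(types)

# The three special double values: Inf, -Inf, and NaN
special <- c(1 / 0, -1 / 0, 0 / 0)
print(special)
```

Each special value is itself a double, so `typeof(special)` returns `"double"`.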
.footnote[
<html>
<hr>
</html>

**Source:** The content of this slide is verbatim from [Hadley Wickham's Advanced R: Chapter 3 on Vectors](https://adv-r.hadley.nz/vectors-chap.html#scalars)
]

---

# Data Structures: Atomic Vector (1D)

<div class="figure" style="text-align: center">
<img src="data:image/png;base64,#../../figures/legos-jbryan-types.png" alt="A visual representation of different types of atomic vectors" width="100%" />
<p class="caption">Keeping the visual representation of different types of atomic vectors in your head!!</p>
</div>

.footnote[
<html>
<hr>
</html>

**Source:** The images are from the excellent [lego-rstats GitHub Repository by Jenny Bryan](https://github.com/jennybc/lego-rstats#readme)
]

---

# Data Structures: 1D ➡️ 2D

.center[<img src="../../figures/legos-jbryan-structures.png" width="92%">]

.footnote[
<html>
<hr>
</html>

**Source:** The images are from the excellent [lego-rstats GitHub Repository by Jenny Bryan](https://github.com/jennybc/lego-rstats#readme)
]

---

# Data Structures: Lists

A list is an object that can contain elements of **different data types**.

.center[<img src="../../figures/legos-jbryan-list.png" width="25%">]

.footnote[
<html>
<hr>
</html>

**Source:** The image is adapted from the excellent [lego-rstats GitHub Repository by Jenny Bryan](https://github.com/jennybc/lego-rstats/blob/master/lego-rstats_014.jpg)
]

---
count: false

# Data Structures: Lists

.center[<img src="https://d33wubrfki0l68.cloudfront.net/9628eed602df6fd55d9bced4fba0a5a85d93db8a/36c16/diagrams/vectors/list.png" width="100%">]

```r
lst <- list( # list constructor/creator
* 1:3, # atomic integer vector of length = 3
* "a", # atomic character vector of length = 1 (aka scalar)
* c(TRUE, FALSE, TRUE), # atomic logical vector of length = 3
* c(2.3, 5.9) # atomic double/numeric vector of length = 2
)
lst # printing the list
```

```
## [[1]]
## [1] 1 2 3
## 
## [[2]]
## [1] "a"
## 
## [[3]]
## [1]  TRUE FALSE  TRUE
## 
## [[4]]
## [1] 2.3 5.9
```

.footnote[
<html>
<hr>
</html>

**Source:** Image is from [Hadley Wickham's Advanced R: Chapter 3 on Vectors](https://adv-r.hadley.nz/vectors-chap.html#lists)
]

---
count: false

# Data Structures: Lists

.pull-left[
Subset by `[]`

```r
lst[1]
```

```
## [[1]]
## [1] 1 2 3
```
]

.pull-right[
Subset by `[[]]`

```r
lst[[1]]
```

```
## [1] 1 2 3
```
]

.center[<img src="../../figures/pepper.png" width="50%">]

.footnote[
<html>
<hr>
</html>

**Sources:** The slide is based on [Earo Wang's STAT 220 slides](https://stats220.earo.me/02-import-export.html#10) and the image is from [Hadley Wickham's Tweet on Indexing lists in R](https://twitter.com/hadleywickham/status/643381054758363136?lang=en).
]

---

# Data Structures: Matrices

A matrix is a **2D data structure** made of **one/homogeneous data type.**

.pull-left[
```r
x_mat = matrix(
  sample(1:10, size = 4),
  nrow = 2, ncol = 2
)
str(x_mat) # its structure?
```

```
##  int [1:2, 1:2] 7 5 1 6
```

```r
x_mat # printing it nicely
print('-----------------')
*x_mat[1, 2] # subsetting
```

```
##      [,1] [,2]
## [1,]    7    1
## [2,]    5    6
## [1] "-----------------"
## [1] 1
```
]

--

.pull-right[
```r
x_char = matrix(
  sample(letters, size = 12),
  nrow = 3, ncol = 4)
x_char
```

```
##      [,1] [,2] [,3] [,4]
## [1,] "d"  "e"  "a"  "u" 
## [2,] "v"  "s"  "k"  "z" 
## [3,] "b"  "n"  "l"  "t" 
```

```r
*x_char[1:2, 2:3] # subsetting
```

```
##      [,1] [,2]
## [1,] "e"  "a" 
## [2,] "s"  "k" 
```
]

---

# Data Structures: Data Frames

.center[<img src="https://d33wubrfki0l68.cloudfront.net/9ec5e1f8982238a413847eb5c9bbc5dcf44c9893/bc590/diagrams/vectors/summary-tree-s3-2.png" width="22%">]

> .font150[If you do data analysis in R, you’re going to be using data frames. A data frame is a named list of vectors with attributes for `(column)` `names`, `row.names`, and its class, “data.frame”. -- [Hadley Wickham](https://adv-r.hadley.nz/vectors-chap.html#list-array)]

.footnote[
<html>
<hr>
</html>

**Source:** Image is from [Hadley Wickham's Advanced R: Chapter 3 on Vectors](https://adv-r.hadley.nz/vectors-chap.html#list-array)
]

---
count: false

# Data Structures: Data Frames

```r
df1 <- data.frame(x = 1:3, y = letters[1:3])
typeof(df1) # showing that it's a special case of a list
```

```
## [1] "list"
```

```r
attributes(df1) # but also is of class data.frame
```

```
## $names
## [1] "x" "y"
## 
## $class
## [1] "data.frame"
## 
## $row.names
## [1] 1 2 3
```

In contrast to a regular list, a data frame has **an additional constraint: the length of each of its vectors must be the same.** This gives data frames their **rectangular structure.**

.footnote[
<html>
<hr>
</html>

**Source:** Content is from [Hadley Wickham's Advanced R: Chapter 3 on Vectors](https://adv-r.hadley.nz/vectors-chap.html#list-array)
]

---
count: false

# Data Structures: Data Frames

As noted in the creation of `df1`, columns in a data frame can be of different types. Hence, data frames are more widely used in data analysis than matrices.

.center[<img src="../../figures/legos-jbryan-dataframe-w-text.png" width="40%">]

.footnote[
<html>
<hr>
</html>

**Source:** The image is adapted from the excellent [lego-rstats GitHub Repository by Jenny Bryan](https://github.com/jennybc/lego-rstats/blob/master/lego-rstats_014.jpg)
]

---

# Data Structures: So What is a Tibble Anyway?

> Tibble is a **modern reimagining of the data frame**. Tibbles are designed to be (as much as possible) **drop-in replacements for data frames** that fix those frustrations. A concise, and fun, way to summarise the main differences is that tibbles are **lazy and surly: they do less and complain more**. -- [Hadley Wickham](https://adv-r.hadley.nz/vectors-chap.html#list-array)

.pull-left[[<img src="https://d33wubrfki0l68.cloudfront.net/565916198b0be51bf88b36f94b80c7ea67cafe7c/7f70b/cover.png" height="320px">](https://adv-r.hadley.nz)]

To learn more about the basics of tibbles, please consult the reference below:

* [Data frames and tibbles (Click and read from 3.6 up to and including 3.6.5)](https://adv-r.hadley.nz/vectors-chap.html#list-array)

---

# Functions

A function call consists of the **function name** followed by one or more **arguments** within parentheses.

```r
temp_high_forecast = c(34, 36, 41, 44, 27, 32, 35)
mean(x = temp_high_forecast)
```

```
## [1] 35.57143
```

* function name: `mean()`, a built-in R function to compute the mean of a vector
* argument: the first argument (LHS `x`) specifies the data (RHS `temp_high_forecast`)

.footnote[
<html>
<hr>
</html>

**Source:** Slide is based on [Earo Wang's STAT 220 Slides](https://stats220.earo.me/01-intro.html#41)
]

---

# Function Help Page

Check the function's help page with `?mean`

### Class Activity

> _Please take 2 minutes to investigate the help page for `mean` in RStudio._

```r
mean(x = temp_high_forecast, trim = 0, na.rm = FALSE)
```

* Read the **Usage** section
  + What arguments have default values?
* Read the **Arguments** section
  + What does `trim` do?
* Run the **Examples** code
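As a starting point for the `trim` and `na.rm` questions, here is a small sketch that reuses the `temp_high_forecast` values from the Functions slide. The names `mean_all`, `mean_trimmed`, and `mean_na` are made up for illustration. The value `trim = 0.2` is chosen because, with only seven observations, a smaller fraction such as `0.1` rounds down to zero observations removed.

```r
temp_high_forecast <- c(34, 36, 41, 44, 27, 32, 35)

# trim = 0 (the default) uses every observation
mean_all <- mean(temp_high_forecast)
print(mean_all)      # 35.57143

# trim = 0.2 drops floor(7 * 0.2) = 1 observation from each end of the
# sorted data (here 27 and 44) before averaging the remaining five
mean_trimmed <- mean(temp_high_forecast, trim = 0.2)
print(mean_trimmed)  # 35.6

# na.rm = TRUE removes missing values before the mean is computed
mean_na <- mean(c(temp_high_forecast, NA), na.rm = TRUE)
print(mean_na)       # 35.57143
```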
.footnote[
<html>
<hr>
</html>

**Source:** Slide is based on [Earo Wang's STAT 220 Slides](https://stats220.earo.me/01-intro.html#42)
]

---

# Function Arguments

.pull-left[
### Match by **positions**

```r
mean(temp_high_forecast, 0.1, TRUE)
```

```
## [1] 35.57143
```
]

.pull-right[
### Match by **names**

```r
mean(x = temp_high_forecast, trim = 0.1, na.rm = TRUE)
```

```
## [1] 35.57143
```
]

.footnote[
<html>
<hr>
</html>

**Source:** Slide is based on [Earo Wang's STAT 220 Slides](https://stats220.earo.me/01-intro.html#43)
]

---

# Use Functions from Packages

.pull-left[
```r
library(dplyr)
dplyr::cummean(temp_high_forecast)
```

```
## [1] 34.00000 35.00000 37.00000 38.75000 36.40000 35.66667 35.57143
```

```r
dplyr::first(temp_high_forecast)
```

```
## [1] 34
```

```r
dplyr::last(temp_high_forecast)
```

```
## [1] 35
```
]

.pull-right[
<br>
<br>
<br>
<br>
.center[
<img src="https://raw.githubusercontent.com/STATS-UOA/stats220/master/lectures/img/install-library.JPG" height="240px">
]
]

.footnote[
<html>
<hr>
</html>

**Source:** Slide is based on [Earo Wang's STAT 220 Slides](https://stats220.earo.me/01-intro.html#44)
]

---
class: inverse, center, middle

# R for time series analysis

---

# The *ts* Object

R facilitates working with time series data by storing it in an aptly named `ts` object.

---

# Create a *ts* Object by asking *ChatGPT*

<center>
<iframe width="840" height="473" src="https://www.loom.com/embed/10576168e19845c29110d25559d938cd" frameborder="0" webkitallowfullscreen mozallowfullscreen allowfullscreen></iframe>
</center>

---

# Demo 1: `Nile` & `AirPassengers` Data

In class, we will create our first R script, where we will examine one of these two built-in datasets. In our exploration, we will:

- Examine the `typeof()` of the dataset.

- Examine the `class()` of the dataset.

- Examine the `length()` of the dataset.

- `print()` the dataset and examine its `frequency()`.

- Subset the data using `window()` and non-`ts` subsetting techniques.
  + Useful for the concept of the entire time series vs a snippet that we discussed in [Class 01]().

---

# A Poll: File Paths in R

<div style='position: relative; padding-bottom: 56.25%; padding-top: 35px; height: 0; overflow: hidden;'><iframe sandbox='allow-scripts allow-same-origin allow-presentation' allowfullscreen='true' allowtransparency='true' frameborder='0' height='400' src='https://www.mentimeter.com/app/presentation/aluy18c8btk2vnfyb9ehkxy351hutsqb/embed' style='position: absolute; top: 0; left: 0; width: 100%; height: 100%;' width='720'></iframe></div>

---

# Demo 2: Loading Time Series Data into R

- Reading CSV files through either the `read.csv()` or `readr::read_csv()` functions.

- Reading FRED and Yahoo Finance data using the `tidyquant` package.

- Preprocessing and converting the data into a time series object.

## Bonus Tricks (if time allows)

- `file.choose()` -- not typically introduced by Miami faculty!!

- The `datapasta` package
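One possible sketch of the "converting the data into a time series object" step, followed by the inspection and subsetting functions listed for Demo 1. It assumes the CSV has already been read into a data frame. The data frame `sales_df`, its column names, and its values are hypothetical placeholders rather than course data, and the other object names are illustrative as well.

```r
# Hypothetical stand-in for the data frame that read.csv() would return;
# the dates and values are made up for illustration only
sales_df <- data.frame(
  date  = seq(as.Date("2021-01-01"), by = "month", length.out = 24),
  sales = 101:124
)

# Convert the value column into a monthly ts object starting in January 2021
sales_ts <- ts(sales_df$sales, start = c(2021, 1), frequency = 12)

# Inspect the object with the functions listed for Demo 1
typeof(sales_ts)     # storage type of the underlying vector
class(sales_ts)      # "ts"
length(sales_ts)     # number of observations
frequency(sales_ts)  # observations per year

# ts-aware subsetting: keep July to December 2022 and the time index
sales_2022_h2 <- window(sales_ts, start = c(2022, 7), end = c(2022, 12))
print(sales_2022_h2)

# Position-based subsetting returns the same values as a plain vector,
# so the time index is lost
sales_last6 <- sales_ts[19:24]
class(sales_last6)
```

The same calls work on the built-in `Nile` and `AirPassengers` objects, which are already stored as `ts`.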
---

# Summary of Main Points

By now, you should be able to do the following:

- Describe the syntax, data types, and data structures in R.

- Access the help for R functions (each help file has the following components: Description, Usage, Arguments, Value, and Examples).

- Utilize the project workflow in RStudio and create an R script.

- Access, subset, and create `ts()` objects in R.

---

# Things to Do to Prepare for Our Next Class

- Go over your notes, read the **references below**, and **complete** the [self-paced R tutorial](http://rstudio.fsb.miamioh.edu:3838/megahefm/isa444/spring2023/datatypes/).

- Complete [Assignment 02](https://miamioh.instructure.com/courses/188655/assignments/2368786) on Canvas.

.pull-left[
.center[[<img src="https://d33wubrfki0l68.cloudfront.net/b88ef926a004b0fce72b2526b0b5c4413666a4cb/24a30/cover.png" height="350px">](https://r4ds.had.co.nz)]
]

.pull-right[
* [Data Visualization](https://r4ds.had.co.nz/data-visualisation.html)
* [Graphics for Communication](https://r4ds.had.co.nz/graphics-for-communication.html)
* [Dates and Times](https://r4ds.had.co.nz/dates-and-times.html)
]